[core] Optimization of Parquet Predicate Pushdown Capability #4608
Conversation
Looks very nice! Thanks @Aiden-Dong, I will review it next week.
@Aiden-Dong Can you add a test for parquet page predicate pushdown with deletion vectors enabled? I just want to make sure.
Spark Issue: https://issues.apache.org/jira/browse/SPARK-34859
Indeed, the currentRowPosition index has a problem when deletion vectors are enabled. Let me fix this and add a test.
Based on the error log, it seems that this error is not likely caused by the recent changes.
You can enable Actions in your repo to double-check.
+1 for non-test code
.withIOManager(new IOManagerImpl(tempDir.toString()));

for (int i = 0; i < 2000; i++) {
    write.write(rowData(i, i, i * 100L));
If you want the test data in one bucket for the deletion vector case, just use rowData(0, i, i * 100L); the first column is "pt", the partition column.
for (int i = 0; i < 2000; i++) {
write.write(rowData(i, i, i * 100L));
The code above will write 2000 records in 2000 partitions.
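A minimal sketch of the suggested change (using the same rowData helper as in the test): keeping the first column constant writes every record into a single partition, so one bucket and one deletion vector cover all of them.

// Write all 2000 records into the single partition pt = 0,
// so they end up in one bucket and the deletion vector applies to them.
for (int i = 0; i < 2000; i++) {
    write.write(rowData(0, i, i * 100L));
}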
.newWrite()
        .withIOManager(new IOManagerImpl(tempDir.toString()));
for (int i = 1000; i < 2000; i++) {
    write.write(rowDataWithKind(RowKind.DELETE, i, i, i * 100L));
Same here: rowDataWithKind(RowKind.DELETE, 0, i, i * 100L).
List<Split> splits = toSplits(table.newSnapshotReader().read().dataSplits());

for (int i = 500; i < 510; i++) {
    TableRead read = table.newRead().withFilter(builder.equal(0, i)).executeFilter();
Maybe builder.equal(1, i) is what you want. A partition predicate will not be pushed down.
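For illustration, a sketch of the suggested filter (same builder and table as in the test); index 1 is the key column here, while index 0 is the partition column "pt", whose predicate would not be pushed down:

// Filter on field index 1 (the key column) instead of index 0 (the partition column),
// so the predicate can actually be pushed down into the Parquet reader.
TableRead read = table.newRead().withFilter(builder.equal(1, i)).executeFilter();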
}

for (int i = 1500; i < 1510; i++) {
    TableRead read = table.newRead().withFilter(builder.equal(0, i)).executeFilter();
ditto
I misunderstood the first column as the primary key... I'll make the correction.
I have revised the unit tests and increased the write cardinality to ensure that there are more pages and RowGroups in the Parquet file.
Thanks for the testing and PR. I downloaded and tested it immediately. Compared with the previous version, this PR increased the speed of Parquet reads by nearly 10 times, which is a huge improvement! On my computer, the Parquet result is 8.4s and the ORC result is 4.1s. @Aiden-Dong @JingsongLi @leaves12138

Table table = TableUtil.getTable(); // PrimaryKeyFileStoreTable
PredicateBuilder builder = new PredicateBuilder(
RowType.of(DataTypes.INT(),
DataTypes.STRING(),
DataTypes.STRING()));
int[] projection = new int[] {0, 1, 2};
ReadBuilder readBuilder = table.newReadBuilder()
.withProjection(projection);
Random random = new Random();
for (int i = 0; i < 30; i++) {
InnerTableRead read = (InnerTableRead)readBuilder.newRead();
int key = random.nextInt(4000000);
Predicate keyFilter = builder.equal(0, key);
InnerTableScan tableScan = (InnerTableScan) readBuilder
.withFilter(keyFilter)
.newScan();
InnerTableScan innerTableScan = tableScan.withFilter(keyFilter);
TableScan.Plan plan = innerTableScan.plan();
List<Split> splits = plan.splits();
read.withFilter(keyFilter);//.executeFilter();
RecordReader<InternalRow> reader = read.createReader(splits);
reader.forEachRemaining(internalRow -> {
int f0 = internalRow.getInt(0);
String f1 = internalRow.getString(1).toString();
String f2 = internalRow.getString(2).toString();
System.out.println(String.format("%d - {%d, %s, %s}",key, f0, f1, f2));
});
}
long startTime = System.currentTimeMillis();
for (int i = 0; i < 1000; i++) {
InnerTableRead read = (InnerTableRead)readBuilder.newRead();
int key = random.nextInt(4000000);
Predicate keyFilter = builder.equal(0, key);
InnerTableScan tableScan = (InnerTableScan) readBuilder
.withFilter(keyFilter)
.newScan();
InnerTableScan innerTableScan = tableScan.withFilter(keyFilter);
TableScan.Plan plan = innerTableScan.plan();
List<Split> splits = plan.splits();
read.withFilter(keyFilter);//.executeFilter();
RecordReader<InternalRow> reader = read.createReader(splits);
reader.forEachRemaining(internalRow -> {
int f0 = internalRow.getInt(0);
String f1 = internalRow.getString(1).toString();
String f2 = internalRow.getString(2).toString();
System.out.println(String.format("%d - {%d, %s, %s}",key, f0, f1, f2));
});
}
long stopTime = System.currentTimeMillis();
System.out.println("time : " + (stopTime - startTime)); writer see #4586 |
Thanks @ranxianglei for testing, I will follow up on this question.
I found that the default stripe size and row index stride of ORC are twice as large as those of Parquet, which leads to a finer granularity of indexes in ORC.
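For context, a rough sketch of how Parquet granularity can be tuned with plain parquet-mr Hadoop configuration keys; whether Paimon forwards these exact keys is an assumption to verify, and the values below are only illustrative:

import org.apache.hadoop.conf.Configuration;

// Smaller row groups and pages produce more min/max index entries, i.e. finer-grained
// pruning for predicate pushdown, at the cost of extra metadata and smaller I/O units.
Configuration conf = new Configuration();
conf.setLong("parquet.block.size", 64L * 1024 * 1024);   // row group size in bytes
conf.setInt("parquet.page.size", 64 * 1024);             // data page size in bytes
conf.setInt("parquet.page.row.count.limit", 10000);      // cap on rows per page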
@Aiden-Dong Hi, I still have some questions about the pull request. How can I get in touch with you?
It just returns the rowIndex within the current row group, but what we want is the exact row index within the whole Parquet file.
Hold on, let me take a look at this logic.
I've submitted a PR to fix this logic: #4636
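A minimal sketch of the intended semantics (the names here are hypothetical, not the actual code of #4636): the position used for the deletion vector must be absolute within the file, so the row group's starting row index has to be added to the offset inside the group.

// Hypothetical illustration: rowGroupFirstRowIndex is the total number of rows in all
// preceding row groups of the file; rowInGroup is the position inside the current group.
long currentRowPosition(long rowGroupFirstRowIndex, long rowInGroup) {
    // The deletion vector is keyed on the absolute row position in the whole Parquet
    // file, not on the group-local index.
    return rowGroupFirstRowIndex + rowInGroup;
}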
switch (runLenDecoder.mode) {
    case RLE:
        if (runLenDecoder.currentValue == maxDefLevel) {
            skipShot(n);
maybe skipShort?
Purpose
Linked issue: #4586
Optimized the predicate pushdown capability for filtering and reading Parquet files, enhancing the original predicate pushdown from the RowGroup level to the Column Page level, resulting in a significant improvement in query performance.
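As a rough illustration of what page-level pushdown means in plain parquet-mr terms (this is the underlying mechanism, not the Paimon code path added by this PR; the column name "f0" and the file path are placeholders): with column indexes enabled, the reader can use per-page min/max statistics to skip individual pages instead of only whole RowGroups.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.ParquetReadOptions;
import org.apache.parquet.filter2.compat.FilterCompat;
import org.apache.parquet.filter2.predicate.FilterApi;
import org.apache.parquet.filter2.predicate.FilterPredicate;
import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.util.HadoopInputFile;

// Placeholder input file; in Paimon this would be the data file of a split.
HadoopInputFile inputFile =
        HadoopInputFile.fromPath(new Path("/tmp/data.parquet"), new Configuration());

// Predicate on the placeholder int column "f0".
FilterPredicate predicate = FilterApi.eq(FilterApi.intColumn("f0"), 42);

ParquetReadOptions options = ParquetReadOptions.builder()
        .withRecordFilter(FilterCompat.get(predicate))
        .useColumnIndexFilter(true) // consult column/offset indexes, prune at page granularity
        .build();

try (ParquetFileReader reader = ParquetFileReader.open(inputFile, options)) {
    // Only row ranges whose pages can contain f0 == 42 are read and decoded;
    // without column index filtering, pruning stops at the RowGroup level.
    long matchingRows = reader.getFilteredRecordCount();
    System.out.println("rows after page-level pruning: " + matchingRows);
}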
Tests
API and Format
Documentation